PREFACE


The research summarized in this Memorandum provides a method for estimating the proportions of items in each of several categories, based upon an item-by-item classification in which many items may be misclassified.

The motivation for the work had its genesis in a desire to compensate for incorrect answers that might be found in prisoner-of-war interviews. In that context, the items being classified are subjects in an interview (interrogation), and misclassification takes place when the subject either deliberately lies or, for some other reason, gives an answer that does not correspond to the facts.

The technique developed is statistical and may be applied to a wide variety of problems, both military and nonmilitary. In such problems, it is desired to determine the characteristics of a group of people or items in which large-scale misclassification is inherently a factor. The research should be of interest to planners and evaluators of military surveys, directors of counterinsurgency programs, and operations

The author, a consultant to the RAND Corporation, is Associate Professor at the Graduate School of Business, University of Chicago.
SUMMARY


The problem examined in this Memorandum is that of estimating the category probabilities for items which may have been misclassified.

A specific case of interest is that in which the items are subjects being interviewed and the subjects may be hostile. A hostile subject is one whose response to a question posed in an interview does not correspond to the true or factual situation. Although some surveys contain very few hostile subjects, others contain hostility as an inherent factor, such as those in which the members of a surveyed group have reason to deliberately mislead the interviewer. Of course, the misclassified items could equally well be diagnosed patients in a hospital, accused parties in a court, or any one of many other possible constructs. The theory is developed in a subject-interview context, however, in order to be specific.

The procedure recommended for this problem requires that an assessment be made, for each subject, of the probability that the subject is hostile. These probabilities are then combined with the actual responses to yield maximum likelihood estimators of the parameters. The problem is reduced to one of concave programming with a logarithmic objective function. Efficiency of the estimators is discussed in terms of their variances. A Bayesian approach to evaluation of the misclassification probabilities (or hostility probabilities) is presented, and an opposition strategy is offered. Numerical examples are given to illustrate application of the procedure and the effect of ignoring the misclassified-items problem.
I. INTRODUCTION


In sampling from a human population it is usually assumed that the interviewee is cooperative and that his responses to questions correspond to the true situation with respect to this individual.

Thus, in a sample of size n with $S_1$ individuals who claim to have a given characteristic, $S_1/n$ is the maximum likelihood estimator of the population proportion corresponding to the given characteristic. If, however, some of the people questioned are hostile to the interviewer in that their responses are false for some reason, $S_1/n$ is no longer a reasonable estimator, and a new estimation procedure must be found.

In this Memorandum procedures are developed for estimating the true population proportions in each category vis-à-vis a noncooperative group of interviewees. The two-category and the multicategory response cases are treated separately, owing to the intrinsic interest of the two-category response case. Maximum likelihood estimators are developed for both cases, and it will be seen that evaluating the estimator explicitly for a sample of n subjects and r categories requires the solution of a simple concave programming problem involving a logarithmic objective function in variables confined to the unit interval. Standard gradient methods for solving concave programming problems like this one are already available. For a survey containing many questions, the estimators would be evaluated for each question separately, and in the following it will be assumed that the analysis applies to just a single question. An outline will be given for generalizing the analysis to consideration of many questions simultaneously.
An approximate method for estimating the category proportions, which yields results rapidly for two categories, is outlined in Sec. IV.

The analysis to be developed is couched in the subject-interview framework only to be specific. In fact, the reader may equally well interpret the problem as one involving n items to be classified into one of r mutually exclusive categories; although the classifier is not certain how to classify perfectly, he can assign probabilities of correctly classifying each of the items.

Previous investigations of this type of problem have been mostly concerned with methods of maximizing the number of cooperative interviewees. For example, a recent recommendation for increasing cooperation made by Warner involves inducing the subject to be truthful by convincing him that he is responding only according to a probabilistic mechanism.

In some situations there might be available a priori information on the probability distribution of the population proportion parameter for each of the categories. In such cases the Bayesian approach suggested by Hendricks(2) might reduce the error bias caused by the unreliable data.

In a different line of approach, Mote and Anderson(3) studied the effect of errors in classifying the subject on the usual chi-square tests of hypotheses. They found that, if the errors are ignored, the test size will increase and the asymptotic power will be reduced. Except for some special cases, however, the problem cannot be solved without knowledge of the misclassification probabilities.
The situation considered here assumes that it is possible for an assessment to be made of the "reliability" of the interviewee's responses (a Bayesian interpretation is given in Sec. IV). That is, it is assumed that it is possible to establish the probability that the subject's answer is correct. This information is then incorporated into the estimation procedure. It should be noted that the analysis to follow does not depend upon the reason the subject's response does not correspond to the true situation. Some subjects, of course, may deliberately lie. However, the responses of others may not correspond to the facts because of indigenous cultural differences between the interviewer and interviewee, psychological problems of the interviewee, semantic difficulties, etc. In what follows, the terms "truth telling" and "lying" will be used to denote the extremes of discrepancy between the subject's response and the true situation; however, the terms should be interpreted in the general sense described above. (If the context were not that of subjects being interviewed, the "items" might be misclassified for a wide variety of reasons, depending on the specific case at hand.)

Estimates obtained from sample surveys usually contain some bias that can be associated directly with data obtained from subjects who "lie," i.e., hostile subjects. The effect is of course small if the degree of hostility in the survey is proportionately small. However, in certain marketing, advertising, and voter preference surveys, hostility to the interviewer is inherent and is therefore too large a factor to be ignored.
Section II contains the development of the maximum likelihood equations for various special cases of interest. Section III provides a discussion of the efficiency of the estimators for this problem and illustrates the estimation procedures with numerical examples. Section IV contains an analysis of the considerations that surround the problem of assessing the misclassification probabilities (and of ignoring them), and the effect of assessment errors on the results. Section V considers the problem of generalizing the analysis to include simultaneous evaluation of many questions. Finally, Sec. VI examines the problem from an entropy-information standpoint and suggests an optimal strategy for hostile groups.
II. MAXIMUM LIKELIHOOD ESTIMATION


In this section the notation, terminology, and general model adopted throughout are introduced, and the maximum likelihood estimation equations are developed. It is generally assumed that there are n subjects interviewed, any or all of whom may be hostile.


MULTICATEGORY  RESPONSE  CASE 

Each question is phrased so that the response of every subject can be placed into one of r mutually exclusive and exhaustive categories. Each interviewee is assigned a probability that his response is truthful (this assignment is based upon collateral information, using procedures discussed in Sec. IV). Let


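$$
X_{kj} = \begin{cases} 1, & \text{if subject } j \text{ actually belongs to category } k \\ 0, & \text{otherwise} \end{cases}
$$

$$
Z_{kj} = \begin{cases} 1, & \text{if subject } j \text{ claims to belong to category } k \\ 0, & \text{otherwise} \end{cases}
\qquad
Y_j = \begin{cases} 1, & \text{if subject } j \text{ tells the truth} \\ 0, & \text{otherwise} \end{cases}
$$

for $k = 1, 2, \ldots, r$, $r \ge 2$; and $j = 1, 2, \ldots, n$.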

Define the r-dimensional unit vectors $\mu_1' = (1, 0, \ldots, 0), \ldots, \mu_r' = (0, \ldots, 0, 1)$. Let the result of the jth interview be denoted by the unit vector $Z_j' = (Z_{1j}, \ldots, Z_{rj})$, and denote the true characterization by the vector $X_j' \equiv (X_{1j}, \ldots, X_{rj})$. Also, we will use lower-case x's and z's to denote observed values. The assumptions of the model may be summarized as


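(a)
$$
P\{X_j = \mu_k\} = p_k, \qquad k = 1, \ldots, r, \tag{1a}
$$
where the $p_k$ are unknown parameters satisfying $0 \le p_k \le 1$, $\sum_{k=1}^{r} p_k = 1$, and, for convenience, $q_k \equiv 1 - p_k$;

(b)
$$
P\{Y_j = 1 \mid X_{kj} = 1\} = \pi_{kj}^{(1)}, \tag{1b}
$$
where the $\pi_{kj}^{(1)}$ are all known, and of course $P\{Y_j = 0 \mid X_{kj} = 1\} = 1 - \pi_{kj}^{(1)}$;

(c)
$$
P\{Z_{kj} = 1 \mid Y_j = 1, X_{kj} = 1\} = 1
\quad\text{and}\quad
P\{Z_{kj} = 1 \mid Y_j = 0, X_j = \mu_m\} = \begin{cases} 0, & k = m \\ \dfrac{1}{r-1}, & k \ne m \end{cases} \tag{1c}
$$

(d) $(Z_1, \ldots, Z_n)$ are mutually independent random vectors. (1d)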


Note that although Assumption (c) implies that all categories are equally likely to be selected by a lying subject, the extension to a nonuniformly weighted distribution is immediate. That is, we could replace Assumption (c) by

$$
P\{Z_{kj} = 1 \mid Y_j = 0, X_j = \mu_m\} = \gamma_{km}, \qquad k \ne m,
$$

where the $\gamma$'s are known constants satisfying $0 \le \gamma_{km} \le 1$ and $\sum_{k \ne m} \gamma_{km} = 1$. For simplicity, assume $\gamma_{km} = (r-1)^{-1}$.

Now define

$$
\pi_{kj}^{(0)} = P\{Y_j = 1 \mid X_j \ne \mu_k\}. \tag{2a}
$$


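Since

$$
P\{Y_j = 1, X_j \ne \mu_k\} = \sum_{m \ne k} P\{Y_j = 1 \mid X_j = \mu_m\}\, p_m = \sum_{m \ne k} \pi_{mj}^{(1)}\, p_m
$$

and $P\{X_j \ne \mu_k\} = q_k$, it is seen that

$$
\pi_{kj}^{(0)} = \frac{1}{q_k} \sum_{m \ne k} p_m\, \pi_{mj}^{(1)} \tag{2b}
$$

for $j = 1, \ldots, n$, $k = 1, \ldots, r$. Note from Eq. 2b that when r = 2, knowledge of $\pi_{1j}^{(1)}, \pi_{2j}^{(1)}$ implies knowledge of $\pi_{1j}^{(0)}, \pi_{2j}^{(0)}$ (in fact $\pi_{1j}^{(0)} = \pi_{2j}^{(1)}$ and $\pi_{2j}^{(0)} = \pi_{1j}^{(1)}$). However, this is not true for r > 2, where $\pi_{kj}^{(0)}$ also depends on the unknown $p_m$.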

The probability mass function for $Z_j$ is similar to that of a multinomial, and is given by

$$
P\{Z_j = z_j\} = \prod_{k=1}^{r} \left[ P\{Z_{kj} = 1\} \right]^{z_{kj}} \tag{3}
$$

with $z_j' = (z_{1j}, \ldots, z_{rj})$, for $j = 1, \ldots, n$. To evaluate Eq. 3 it is necessary to determine the unconditional probability distribution of $Z_j$. Since

$$
P\{Z_j = \mu_k\} = P\{Z_{kj} = 1\},
$$

it is only necessary to consider the latter.


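$$
P\{Z_{kj} = 1\} = p_k\, P\{Z_{kj} = 1 \mid X_{kj} = 1\} + q_k\, P\{Z_{kj} = 1 \mid X_{kj} = 0\}.
$$

But

$$
P\{Z_{kj} = 1 \mid X_{kj} = 0\} = P\{Z_{kj} = 1 \mid X_{kj} = 0, Y_j = 1\}\, P\{Y_j = 1 \mid X_{kj} = 0\} + P\{Z_{kj} = 1 \mid X_{kj} = 0, Y_j = 0\}\, P\{Y_j = 0 \mid X_{kj} = 0\}.
$$

By Assumption (c) the first conditional probability on the right is zero and the second is $1/(r-1)$, so

$$
P\{Z_{kj} = 1 \mid X_{kj} = 0\} = \frac{1 - \pi_{kj}^{(0)}}{r-1}.
$$

Similarly,

$$
P\{Z_{kj} = 1 \mid X_{kj} = 1\} = \pi_{kj}^{(1)}.
$$

Hence

$$
P\{Z_{kj} = 1\} = p_k\, \pi_{kj}^{(1)} + q_k\, \frac{1 - \pi_{kj}^{(0)}}{r-1}. \tag{4}
$$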


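Substituting Eq. 4 into Eq. 3 gives

$$
P\{Z_j = z_j\} = \prod_{m=1}^{r} \left[ p_m\, \pi_{mj}^{(1)} + q_m\, \frac{1 - \pi_{mj}^{(0)}}{r-1} \right]^{z_{mj}}
$$

for $j = 1, 2, \ldots, n$. Thus, if $L^*$ denotes the logarithm of the likelihood function $L(z_1, \ldots, z_n \mid p_1, \ldots, p_r)$,

$$
L^* = \ln L = \sum_{j=1}^{n} \ln P\{Z_j = z_j\},
$$

or

$$
L^* = \sum_{j=1}^{n} \sum_{m=1}^{r} z_{mj} \ln\left[ p_m\, \pi_{mj}^{(1)} + (1 - p_m)\, \frac{1 - \pi_{mj}^{(0)}}{r-1} \right].
$$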

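Now transform the parameters by letting $\theta_m = p_m$, $m = 1, 2, \ldots, r-1$. Since $\sum_{m=1}^{r} p_m = 1$, $p_r = 1 - \sum_{m=1}^{r-1} \theta_m$. Substitution gives

$$
L^* = \sum_{j=1}^{n} \left\{ \sum_{m=1}^{r-1} z_{mj} \ln\left[ \theta_m\, \pi_{mj}^{(1)} + (1 - \theta_m)\, \frac{1 - \pi_{mj}^{(0)}}{r-1} \right] + z_{rj} \ln\left[ \Big(1 - \sum_{m=1}^{r-1} \theta_m\Big) \pi_{rj}^{(1)} + \sum_{m=1}^{r-1} \theta_m\, \frac{1 - \pi_{rj}^{(0)}}{r-1} \right] \right\} \tag{5a}
$$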
Next note from Eq. 4 that, since the response of each subject must fall into one of the r categories,

$$
\sum_{k=1}^{r} P\{Z_{kj} = 1\} = \sum_{k=1}^{r} \left[ p_k\, \pi_{kj}^{(1)} + q_k\, \frac{1 - \pi_{kj}^{(0)}}{r-1} \right] = 1.
$$

However, it may be checked that this result is implied by Eq. 2b (substituting $q_k \pi_{kj}^{(0)} = \sum_{m \ne k} p_m \pi_{mj}^{(1)}$ and using $\sum_k q_k = r - 1$ reduces the sum to one identically), so it is not an additional constraint on the likelihood function.
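As a numerical check on Eqs. 2b and 4, the following minimal Python sketch computes $\pi^{(0)}$ for one subject from hypothetical $\pi^{(1)}$ values and confirms that the response probabilities of Eq. 4 sum to one. The function names and all numbers are illustrative assumptions, and $0 < p_k < 1$ is assumed so that $q_k > 0$.

```python
def pi0_from_2b(p, pi1):
    """Eq. 2b for one subject: pi0[k] = (1/q_k) * sum_{m != k} p[m] * pi1[m]."""
    r = len(p)
    return [sum(p[m] * pi1[m] for m in range(r) if m != k) / (1.0 - p[k])
            for k in range(r)]


def response_probs(p, pi1, pi0):
    """Eq. 4: P{Z_kj = 1} = p_k * pi1_k + q_k * (1 - pi0_k) / (r - 1)."""
    r = len(p)
    return [p[k] * pi1[k] + (1.0 - p[k]) * (1.0 - pi0[k]) / (r - 1)
            for k in range(r)]


p = [0.5, 0.3, 0.2]       # hypothetical category proportions (r = 3)
pi1 = [0.9, 0.6, 0.4]     # hypothetical truth probabilities given the true category
pi0 = pi0_from_2b(p, pi1)
probs = response_probs(p, pi1, pi0)
print(probs, sum(probs))  # the sum is one up to rounding
```

With r = 2 the same function returns $\pi_{1j}^{(0)} = \pi_{2j}^{(1)}$ and $\pi_{2j}^{(0)} = \pi_{1j}^{(1)}$, as noted after Eq. 2b.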

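Substituting the constraint of Eq. 2b into the equation for $L^*$ yields, with $\theta_r \equiv 1 - \sum_{i=1}^{r-1} \theta_i$,

$$
L^* = \sum_{j=1}^{n} \sum_{m=1}^{r} z_{mj} \ln\left[ \theta_m\, \pi_{mj}^{(1)} + \frac{1 - \theta_m - \sum_{i \ne m} \theta_i\, \pi_{ij}^{(1)}}{r-1} \right] \tag{5b}
$$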


The problem that must be solved is that of finding, for $L^*$ defined in Eq. 5b,

$$
\max_{\theta}\; L^*(\theta_1, \ldots, \theta_{r-1})
$$

subject to the linear constraints that

$$
0 \le \theta_1, \theta_2, \ldots, \theta_{r-1} \le 1, \qquad 0 \le \sum_{i=1}^{r-1} \theta_i \le 1.
$$


Recall that $\ln x$ is a concave function of x for any positive scalar x. Hence, it follows by definition that $\ln(g'\theta + v)$ is concave in $\theta = (\theta_1, \ldots, \theta_{r-1})'$ for any $(g, v)$ with $g'\theta + v > 0$. Noting that $L^*$ is just a sum of such functions (each bracketed argument in Eq. 5b is affine in $\theta$, because $\theta_r$ is), $L^*$ is concave in $\theta$. Thus, ours is a concave programming problem that may be solved (if the dimension of the problem is small enough) using any one of several standard computer routines. (See, for example, Rosen(4) for an exposition of the Gradient Projection method.) In any case, since the derivatives of $L^*$ are fundamental to the solution, an idea of the magnitude of difficulty involved can be obtained by examining the unconstrained solution. The classical differentiation approach (neglecting the constraints) yields estimators that satisfy the equations


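$$
\sum_{j=1}^{n} \sum_{m=1}^{r-1} \frac{z_{mj}\, v_{m\ell,j}}{v_{m,j}'\theta + (1 - \pi_{rj}^{(1)})/(r-1)} = -\sum_{j=1}^{n} \frac{z_{rj}\, w_{\ell j}}{w_j'\theta + \pi_{rj}^{(1)}}, \qquad \ell = 1, \ldots, r-1, \tag{6}
$$

where

$$
v_{m\ell,j} = \frac{\pi_{rj}^{(1)} - \pi_{\ell j}^{(1)}}{r-1} \;\; (\ell \ne m), \qquad v_{mm,j} = \pi_{mj}^{(1)} + \frac{\pi_{rj}^{(1)} - 1}{r-1}, \qquad v_{m,j}' = (v_{m1,j}, \ldots, v_{m,r-1,j}),
$$

$$
w_{\ell j} = -\pi_{rj}^{(1)} + \frac{1 - \pi_{\ell j}^{(1)}}{r-1}, \qquad w_j' = (w_{1j}, \ldots, w_{r-1,j}).
$$

Examination of Eq. 6 shows that this is a system of (r - 1) equations, each of which (in general) is of degree 2n(r - 1) - 1. Thus, if 100 subjects are interviewed on a question with, say, three possible responses, Eq. 6 represents a system of two equations, each of which is of degree 399.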
Implicit in Eq. 6 was the assumption that there existed $\theta_k$'s maximizing Eq. 5b which lie in the unit interval. When such $\theta_k$'s do not exist, the alternative to treating the problem with a programming algorithm is to select the value zero or one, whichever yields the larger $L^*$. This remark applies also to the special cases considered below.
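The text points to gradient projection for the constrained problem. As an alternative sketch, note that each term of Eq. 5b equals $\ln \sum_k p_k\, P\{\text{claim}_j \mid \text{true } k\}$, where by Assumption (c) $P\{\text{claim } m \mid \text{true } k\}$ is $\pi_{kj}^{(1)}$ if m = k and $(1 - \pi_{kj}^{(1)})/(r-1)$ otherwise. That is a mixture likelihood with known components, so the EM iteration below (a swapped-in technique, not the author's method) can climb the concave objective while every iterate stays inside the probability simplex; boundary solutions are approached without special handling. Function names and data are hypothetical.

```python
import math


def claim_matrix(pi1_j, r):
    """A[k][m] = P{claim m | true k} for one subject under Assumption (c)."""
    return [[pi1_j[k] if m == k else (1.0 - pi1_j[k]) / (r - 1)
             for m in range(r)] for k in range(r)]


def log_likelihood(p, claims, pi1):
    """Eq. 5b rewritten as sum_j ln sum_k p_k * P{claim_j | true k}."""
    r = len(p)
    total = 0.0
    for c, pj in zip(claims, pi1):
        A = claim_matrix(pj, r)
        total += math.log(sum(p[k] * A[k][c] for k in range(r)))
    return total


def em_estimate(claims, pi1, r, iters=10000, tol=1e-12):
    """EM update p_k <- (1/n) sum_j p_k A_jk / sum_i p_i A_ji.

    claims[j] is the 0-based category claimed by subject j; pi1[j] lists the
    known pi_kj^(1) values for that subject (hypothetical inputs)."""
    n = len(claims)
    p = [1.0 / r] * r
    mats = [claim_matrix(pj, r) for pj in pi1]
    for _ in range(iters):
        new = [0.0] * r
        for c, A in zip(claims, mats):
            w = [p[k] * A[k][c] for k in range(r)]
            s = sum(w)
            for k in range(r):
                new[k] += w[k] / s / n
        if max(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


# Example 1 setting: r = 2, pi_j = 0.8 for everyone, 7 of 10 claim category one.
claims = [0] * 7 + [1] * 3
pi1 = [[0.8, 0.8]] * 10
p_hat = em_estimate(claims, pi1, 2)
print(p_hat)  # compare with the closed form (0.7 + 0.8 - 1) / (2 * 0.8 - 1)
```

EM converges only linearly, and slowly when the maximum sits on a face of the simplex, so for large problems a gradient-projection or Newton-type routine may be preferable.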


MULTICATEGORY  RESPONSES  WITH  INDEPENDENCE 

If $X_j$ and $Y_j$ are independent random variables, each subject lies independently of the category he occupies. Then, from Eqs. 1b and 2a,

$$
\pi_{kj}^{(0)} = P\{Y_j = 1\} = \pi_{kj}^{(1)} = \pi_j. \tag{7}
$$


Under these conditions the constraint of Eq. 2b is trivial, so that the likelihood function given in Eq. 5a may be used with Eq. 7 instead of that given in Eq. 5b. Without concern for whether the $\theta_k$ lie in the unit interval or whether their sum exceeds one, conventional differentiation shows that the $\theta_k$ satisfy the equation system


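$$
\sum_{j=1}^{n} \frac{z_{kj}\left(\pi_j - \dfrac{1 - \pi_j}{r-1}\right)}{\theta_k \pi_j + (1 - \theta_k)\dfrac{1 - \pi_j}{r-1}} = \sum_{j=1}^{n} \frac{z_{rj}\left(\pi_j - \dfrac{1 - \pi_j}{r-1}\right)}{\Big(1 - \sum_{m=1}^{r-1}\theta_m\Big)\pi_j + \sum_{m=1}^{r-1}\theta_m \dfrac{1 - \pi_j}{r-1}}, \qquad k = 1, \ldots, r-1, \tag{8}
$$

provided $(r\pi_j - 1)/(r-1) \ne 0$ for some j. Now each equation in the system is of degree (2n - 1), as will also be true for the remaining cases.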

Next examine the likelihood function in Eq. 5a from which these equations were derived. The concavity argument following Eq. 5b is seen to be applicable again. Hence, $L^*(\theta_1, \ldots, \theta_{r-1})$ is a concave function and can be maximized by conventional convex programming algorithms for $\theta = (\theta_1, \ldots, \theta_{r-1})'$ in the simplex bounded by the coordinate hyperplanes and the hyperplane passing through the terminals of all the unit vectors. The solution will, of course, have the desirable property that any feasible local maximum found will also be a global maximum (this was also the case for the general multicategory problem above).
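One way to check the reconstructed Eq. 8 is to compare its left side minus its right side, which should equal $\partial L^*/\partial\theta_k$, with a central-difference derivative of Eq. 5a under independence. The sketch below does this for hypothetical data with r = 3; the function names, responses, and $\pi_j$ values are illustrative assumptions.

```python
import math


def loglik_5a(theta, claims, pis, r):
    """Eq. 5a under independence (pi_kj^(1) = pi_kj^(0) = pi_j).

    theta holds theta_1..theta_{r-1}; claims are 0-based categories."""
    total = 0.0
    s = sum(theta)
    for c, pj in zip(claims, pis):
        d = (1.0 - pj) / (r - 1)
        if c < r - 1:
            arg = theta[c] * pj + (1.0 - theta[c]) * d
        else:
            arg = (1.0 - s) * pj + s * d
        total += math.log(arg)
    return total


def eq8_residual(theta, claims, pis, r, k):
    """Left side minus right side of Eq. 8 for 0-based index k < r - 1."""
    lhs = rhs = 0.0
    s = sum(theta)
    for c, pj in zip(claims, pis):
        d = (1.0 - pj) / (r - 1)
        v = pj - d  # equals (r * pi_j - 1) / (r - 1)
        if c == k:
            lhs += v / (theta[k] * pj + (1.0 - theta[k]) * d)
        elif c == r - 1:
            rhs += v / ((1.0 - s) * pj + s * d)
    return lhs - rhs


def numeric_grad(theta, claims, pis, r, k, h=1e-6):
    """Central-difference derivative of Eq. 5a with respect to theta_k."""
    up = list(theta)
    dn = list(theta)
    up[k] += h
    dn[k] -= h
    return (loglik_5a(up, claims, pis, r) - loglik_5a(dn, claims, pis, r)) / (2 * h)


claims = [0, 1, 2, 0, 2, 1, 0, 0]                    # hypothetical responses
pis = [0.9, 0.7, 0.6, 0.8, 0.5, 0.95, 0.4, 0.75]     # hypothetical pi_j
theta = [0.4, 0.35]
for k in range(2):
    print(k, eq8_residual(theta, claims, pis, 3, k),
          numeric_grad(theta, claims, pis, 3, k))
```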

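TWO-CATEGORY RESPONSE CASE

Since many surveys involve only questions with two possible responses, this special case is of particular interest and is therefore evaluated separately. Letting r = 2 in Eq. 6 shows that in this case the maximum likelihood estimator $\theta_1$ satisfies

$$
\sum_{j=1}^{n} \frac{z_{1j}\,\big(\pi_{1j}^{(1)} + \pi_{2j}^{(1)} - 1\big)}{\theta_1\big(\pi_{1j}^{(1)} + \pi_{2j}^{(1)} - 1\big) + \big(1 - \pi_{2j}^{(1)}\big)} = \sum_{j=1}^{n} \frac{z_{2j}\,\big(\pi_{1j}^{(1)} + \pi_{2j}^{(1)} - 1\big)}{\theta_1\big(1 - \pi_{1j}^{(1)} - \pi_{2j}^{(1)}\big) + \pi_{2j}^{(1)}} \tag{9}
$$

or, if the root $\theta_1$ of this equation lies outside the unit interval, the estimate is zero or one (whichever yields the larger $L^*$).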
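Because $L^*$ is concave in $\theta_1$, the left side minus the right side of Eq. 9 (which is $dL^*/d\theta_1$) is nonincreasing, so a simple bisection on [0, 1] finds the root or reports the correct endpoint. The sketch below assumes every $\pi_{kj}^{(1)}$ lies strictly inside (0, 1) so no denominator vanishes at the endpoints; function names and data are hypothetical.

```python
def eq9_derivative(theta, z1, pi1, pi2):
    """dL*/dtheta_1 for r = 2: left side minus right side of Eq. 9."""
    g = 0.0
    for z, a, b in zip(z1, pi1, pi2):
        c = a + b - 1.0
        if z == 1:
            g += c / (theta * c + (1.0 - b))
        else:
            g -= c / (theta * (-c) + b)
    return g


def solve_eq9(z1, pi1, pi2, tol=1e-12):
    """Maximize the concave L* on [0, 1] by bisecting its derivative.

    If the derivative does not change sign, the maximum is at 0 or 1,
    which matches the zero-or-one rule stated after Eq. 9."""
    lo, hi = 0.0, 1.0
    if eq9_derivative(lo, z1, pi1, pi2) <= 0.0:
        return 0.0
    if eq9_derivative(hi, z1, pi1, pi2) >= 0.0:
        return 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eq9_derivative(mid, z1, pi1, pi2) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Independence with pi_j = 0.8 (so pi_1j = pi_2j = 0.8); 7 of 10 claim category one.
z1 = [1] * 7 + [0] * 3
al = [0.8] * 10
print(solve_eq9(z1, al, al))
```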


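Two-category Response Case with Independence

Setting k = 1 in Eq. 7 gives

$$
\pi_{1j}^{(1)} = P\{Y_j = 1\} = \pi_{1j}^{(0)} = \pi_j,
$$

and, since $\pi_{1j}^{(0)} = \pi_{2j}^{(1)}$ when r = 2, also $\pi_{2j}^{(1)} = \pi_j$. Then, from Eq. 9, the (unconstrained) maximum likelihood estimator of $\theta_1$ satisfies

$$
\sum_{j=1}^{n} \frac{z_{1j}\,(2\pi_j - 1)}{\theta_1(2\pi_j - 1) + (1 - \pi_j)} = \sum_{j=1}^{n} \frac{z_{2j}\,(2\pi_j - 1)}{\theta_1(1 - 2\pi_j) + \pi_j} \tag{10}
$$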
III. ESTIMATOR EFFICIENCY AND NUMERICAL EXAMPLES


This section develops the information matrix for this problem, for the case of independence. Since maximum likelihood estimators are asymptotically efficient and normally distributed (under mild regularity conditions), it is of interest to determine the asymptotic covariance matrix (the inverse of the information matrix).


INFORMATION  MATRIX 

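The lower bounds for the variances of any estimators for this problem require the information matrix J. Let $J = (J_{km})$, for $k, m = 1, \ldots, r-1$. Then

$$
J_{km} = E\left\{ \frac{\partial L^*}{\partial \theta_k}\, \frac{\partial L^*}{\partial \theta_m} \right\}, \tag{11}
$$

where $L^*$ is defined in Eq. 5a, for the case of independence, by taking

$$
\pi_{kj}^{(1)} = \pi_{kj}^{(0)} = \pi_j \quad \text{for all } k.
$$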

Note that $L^*$ may be expressed in the form

$$
L^* = \sum_{j=1}^{n} \left\{ \sum_{m=1}^{r-1} z_{mj} \ln E(Z_{mj}) + z_{rj} \ln\Big[ 1 - \sum_{m=1}^{r-1} E(Z_{mj}) \Big] \right\},
$$

and if we define

$$
v_j = \frac{r\pi_j - 1}{r-1}, \qquad j = 1, \ldots, n, \tag{12}
$$




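Hence, since $E(Z_{kj}) = \theta_k v_j + (1 - \pi_j)/(r-1)$ for k < r, so that $\partial E(Z_{kj})/\partial\theta_k = v_j$ and $\partial E(Z_{kj})/\partial\theta_m = 0$ for $m \ne k$,

$$
\frac{\partial L^*}{\partial \theta_k} = \sum_{j=1}^{n} v_j \left[ \frac{z_{kj}}{E(Z_{kj})} - \frac{z_{rj}}{E(Z_{rj})} \right]. \tag{13}
$$

Taking expectations in Eq. 13 gives

$$
E\left( \frac{\partial L^*}{\partial \theta_k} \right) = \sum_{j=1}^{n} v_j (1 - 1) = 0, \qquad k = 1, \ldots, r-1.
$$

Therefore, the minimum variance bounds for this problem exist and may be computed by evaluating Eq. 11.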

Accordingly, by direct algebraic computation, it can be found that if $\delta_{km}$ denotes the Kronecker delta,

$$
J_{km} = \frac{1}{r-1} \sum_{j=1}^{n} (r\pi_j - 1)^2 \left\{ \frac{\delta_{km}}{p_k(r\pi_j - 1) + (1 - \pi_j)} + \frac{1}{p_r(r\pi_j - 1) + (1 - \pi_j)} \right\}. \tag{14}
$$


The diagonal elements of $J^{-1}$ are, of course, the Cramér-Rao lower bounds for the variances of any unbiased estimators of the $p_k$'s.
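To make Eq. 14 concrete, the sketch below builds $J$ for hypothetical $\pi_j$ values, inverts it with a small Gauss-Jordan routine, and checks that for r = 2 the single element of $J^{-1}$ agrees with the closed-form bound of Eq. 15 below. Function names and numbers are illustrative; the $p_k$ are assumed to lie strictly inside the simplex so all denominators are positive.

```python
def info_matrix(p, pis):
    """Eq. 14 (independence): J[k][m] for k, m = 0..r-2, i.e. 1..r-1.

    p = [p_1, ..., p_r]; pis = [pi_1, ..., pi_n] (hypothetical values)."""
    r = len(p)
    J = [[0.0] * (r - 1) for _ in range(r - 1)]
    for pj in pis:
        c = r * pj - 1.0                          # r * pi_j - 1
        inv_r = 1.0 / (p[-1] * c + (1.0 - pj))    # 1 / [p_r(r pi_j - 1) + (1 - pi_j)]
        for k in range(r - 1):
            inv_k = 1.0 / (p[k] * c + (1.0 - pj))
            for m in range(r - 1):
                term = inv_r + (inv_k if k == m else 0.0)
                J[k][m] += c * c * term / (r - 1)
    return J


def invert(M):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        for i in range(n):
            if i != col:
                f = A[i][col]
                A[i] = [x - f * y for x, y in zip(A[i], A[col])]
    return [row[n:] for row in A]


def mvb_r2(p1, pis):
    """Eq. 15: lower bound on Var(p1_hat) when r = 2."""
    s = 0.0
    for pj in pis:
        c = 2.0 * pj - 1.0
        s += c * c / ((p1 * c + 1.0 - pj) * (pj - p1 * c))
    return 1.0 / s


pis = [0.9, 0.7, 0.6, 0.85]
print(invert(info_matrix([0.3, 0.7], pis))[0][0], mvb_r2(0.3, pis))
```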

When r = 2, J reduces to a single element, and substitution in Eq. 14 shows that for any unbiased estimator $\hat p_1$ of $p_1$,

$$
\operatorname{Var}(\hat p_1) \ge \left\{ \sum_{j=1}^{n} \frac{(2\pi_j - 1)^2}{[p_1(2\pi_j - 1) + (1 - \pi_j)]\,[\pi_j - p_1(2\pi_j - 1)]} \right\}^{-1} \tag{15}
$$

and the same lower bound applies to $\operatorname{Var}(\hat p_2)$. Note that if $\pi_j = \tfrac{1}{2}$ for all j, Eq. 15 demands that $\hat p_1$ have infinite variance. However, by restricting $\hat p_1$ to the unit interval, $\operatorname{Var}(\hat p_1)$ will also be restricted to the unit interval, so that a better bound in this case is unity.

Note also that Eq. 5a shows that $L^*$ is then not dependent upon $\theta_1$ (every bracketed argument equals 1/2). This case is discussed in Sec. VI.

EFFICIENCY 

Since $L(z_1, \ldots, z_n \mid \theta_1, \ldots, \theta_{r-1})$ is a regular function of $\theta_1, \ldots, \theta_{r-1}$, i.e., $E(\partial L^*/\partial\theta_k) = 0$ for all k, the minimum variance bounds (MVB) for the parameters always exist. However, in general, the bounds cannot be attained (although, as will be seen, they are attainable in some cases). A necessary condition for attainability of the MVB, for all values of the parameters, is the existence of a sufficient statistic for the problem. But it is easy to check that for this problem one does not exist in general. Alternatively, one might attempt to find the Bhattacharyya bounds; this approach does not appear to be fruitful.

However, the existence of the MVB does provide the usual standard for measuring efficiency. Thus, if $\varepsilon_k$ denotes the efficiency of the estimator $\hat\theta_k$,

$$
\varepsilon_k = \frac{(J^{-1})_{kk}}{\operatorname{Var}(\hat\theta_k)}, \qquad k = 1, \ldots, r-1,
$$

and if $\varepsilon_k = 1$, $\hat\theta_k$ is called efficient.

One special check case arises, for example, when $\pi_j = \alpha$ = constant for all j, and r = 2. In that case, the maximum likelihood estimators actually attain the MVB, uniformly in $p_1$ (or $p_2$), even for finite n. That is, for any fixed sample size, the variances of the estimators are at least as small as those of any other unbiased estimators (see Example 1 below). The fact that the MVB is attainable in this case follows immediately from the fact that in that special situation there is a sufficient statistic, made up of the totals of people who claim each of the separate categories.


NUMERICAL  EXAMPLES 

Example  1 

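Assume r = 2 and that $\pi_{kj}^{(1)} = \pi_{kj}^{(0)} = \pi_j = \alpha$, where $\alpha$ is a constant, for all j, k. When $\pi_j = \alpha$, $L^*$ can be maximized without regard for the inequality constraints, which are then inactive whenever the resulting estimate lies in the unit interval. Substitution in Eq. 10 gives (recall that $p_1 = \theta_1$)

$$
\hat p_1 = \frac{1}{(2\alpha - 1)n} \sum_{j=1}^{n} (z_{1j} + \alpha - 1)
$$

and $\hat p_2 = 1 - \hat p_1$. Thus, writing $\bar z_k = n^{-1}\sum_{j} z_{kj}$, if all subjects are truthful, $\alpha = 1$ and

$$
\hat p_1 = \bar z_1 \quad (\text{equivalently } \hat p_2 = \bar z_2),
$$

as expected. Conversely, if all subjects lie, $\alpha = 0$ and

$$
\hat p_2 = \frac{1}{n} \sum_{j=1}^{n} (1 - z_{2j}) = 1 - \bar z_2 = \bar z_1.
$$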


The variances of these estimators are easily found. From the estimation result for $\hat p_1$ above,

$$
\operatorname{Var}(\hat p_1) = \frac{1}{(2\alpha - 1)^2 n^2} \operatorname{Var} \sum_{j=1}^{n} (z_{1j} + \alpha - 1) = \frac{1}{(2\alpha - 1)^2 n} \operatorname{Var}(z_1).
$$

But because $P\{z_1 = 1\} = p_1(2\alpha - 1) + (1 - \alpha)$ for this case,

$$
\operatorname{Var}(z_1) = [p_1(2\alpha - 1) + 1 - \alpha]\,[\alpha - p_1(2\alpha - 1)].
$$

Substitution gives

$$
\operatorname{Var}(\hat p_1) = \frac{[p_1(2\alpha - 1) + (1 - \alpha)]\,[\alpha - p_1(2\alpha - 1)]}{(2\alpha - 1)^2 n}.
$$


Evaluation of Eq. 15 for $\pi_j = \alpha$ shows that the MVB is identical with the result just obtained for the maximum likelihood estimator, showing that the latter is efficient in finite samples.
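A small simulation makes the finite-sample claim tangible. The sketch below generates data under Assumptions (a) through (d) with r = 2 and $\pi_j = \alpha$ (a lying subject with two categories must claim the other one), computes the unconstrained estimator $\hat p_1$ of Example 1, and compares its Monte Carlo variance with the formula above. The parameter values, seed, and trial count are illustrative assumptions.

```python
import random


def simulate_var(p1, alpha, n, trials, seed=1):
    """Monte Carlo variance of p1_hat = sum(z_1j + alpha - 1) / ((2 alpha - 1) n)."""
    rng = random.Random(seed)
    est = []
    for _ in range(trials):
        s = 0
        for _ in range(n):
            true_cat1 = rng.random() < p1        # Assumption (a)
            truthful = rng.random() < alpha      # independence: pi_j = alpha
            # With r = 2 a liar claims the other category (Assumption (c)).
            claims1 = true_cat1 if truthful else not true_cat1
            s += claims1
        est.append((s + n * (alpha - 1)) / ((2 * alpha - 1) * n))
    mean = sum(est) / trials
    return sum((e - mean) ** 2 for e in est) / (trials - 1)


def var_formula(p1, alpha, n):
    """Closed-form Var(p1_hat) from Example 1 (equal to the MVB of Eq. 15)."""
    a = p1 * (2 * alpha - 1) + (1 - alpha)       # P{z_1 = 1}
    return a * (1 - a) / ((2 * alpha - 1) ** 2 * n)


print(simulate_var(0.3, 0.8, 50, 20000), var_formula(0.3, 0.8, 50))
```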


Example  2 

Take r = 2 and n = 2m, m = 1, 2, ..., with $\pi_{kj}^{(1)} = \pi_{kj}^{(0)} = \pi_j$. Assume, moreover, that

$$
\pi_j = \begin{cases} \alpha_1, & j = 1, \ldots, m \\ \alpha_2, & j = m + 1, \ldots, 2m. \end{cases}
$$

That is, half of the subjects lie with probability $(1 - \alpha_1)$ and the other half lie with probability $(1 - \alpha_2)$. For simplicity, take $\alpha_1 = 1$ and $\alpha_2 = 0$. Then from Eq. 10,


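$$
\hat p_1 = \frac{1}{n} \left[ \sum_{j=1}^{m} z_{1j} + \sum_{j=m+1}^{2m} (1 - z_{1j}) \right].
$$

Since the $z_{1j}$ are independent,

$$
\operatorname{Var}(\hat p_1) = \frac{1}{n^2} \operatorname{Var} \sum_{j=1}^{m} z_{1j} + \frac{1}{n^2} \operatorname{Var} \sum_{j=m+1}^{2m} (1 - z_{1j}),
$$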
and, since $P\{z_{1j} = 1\} = p_1(2\alpha - 1) + (1 - \alpha)$ when $\pi_j = \alpha$,

$$
\operatorname{Var}(z_{1j}) = \begin{cases} p_1(1 - p_1), & j = 1, \ldots, m \\ (1 - p_1)\,p_1, & j = m + 1, \ldots, 2m. \end{cases}
$$

Hence,

$$
\operatorname{Var}(\hat p_1) = \frac{p_1(1 - p_1)}{n}.
$$

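Substitution into Eq. 15 to evaluate the MVB gives

$$
\operatorname{Var}(\hat p_1) \ge \left\{ \frac{m(2\alpha_1 - 1)^2}{[p_1(2\alpha_1 - 1) + 1 - \alpha_1]\,[\alpha_1 - p_1(2\alpha_1 - 1)]} + \frac{m(2\alpha_2 - 1)^2}{[p_1(2\alpha_2 - 1) + (1 - \alpha_2)]\,[\alpha_2 - p_1(2\alpha_2 - 1)]} \right\}^{-1}
$$

or, with $\alpha_1 = 1$ and $\alpha_2 = 0$,

$$
\operatorname{Var}(\hat p_1) \ge \frac{p_1(1 - p_1)}{2m} = \frac{p_1 p_2}{n}.
$$

Hence, again in this case the maximum likelihood estimators are efficient in finite samples.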
IV. ASSESSMENT OF MISCLASSIFICATION PROBABILITIES
AND RELATED EFFECTS


The above estimation procedures have been developed on the basis that the probabilities that the subject's responses coincide with the true situation are known or may in some way be determined. Indeed, there are many situations in which the $\pi$'s may be determined on the basis of collateral information. This section considers the techniques for assessing the misclassification probabilities and the effect of ignoring hostile subjects in the analysis.


ASSESSMENT  TECHNIQUES 

The technique that should be used to assign these probabilities varies with the circumstances. Often it may be possible to decide upon the reliability of responses to certain questions on the basis of the subject's answers to other questions about which the interviewer has personal knowledge and additional information. In some surveys, the behavior of the subject during the interview might be the only available basis for a rational assessment, whereas in severe circumstances polygraph instruments or drugs might serve as the main bases for assessment. A quantitative measure that depends upon the length of the subject's response or the total time of the interview might also be incorporated into the decisionmaking associated with $\pi_j$. In the personality questionnaires often given to employees or prospective employees of a company, a certain subset of "test questions" usually serves to establish the reliability of the subject's responses. In general, every effort should be made in designing the questionnaire so that the difficulty of assessing the $\pi$'s is minimized.
In summary, there are many available indicators of whether or not a subject is falsifying his responses to an interviewer, and they may be used individually or in combination to obtain estimates of the truth probabilities.


A  BAYESIAN  APPROACH 


A direct justification for replacing the $\pi_{kj}^{(1)}$ parameters by their "best guess" estimates may be found in the Bayesian approach. Now, instead of assuming the $\pi_{kj}^{(1)}$ are known parameters (as assumed in Eq. 1b), assume they are random with known mean values $M_{kj}$, so that $M_j' = (M_{1j}, \ldots, M_{rj})$. Define the r-vectors

$$
a_m = p_m \mu_m - \frac{p - p_m \mu_m}{r-1},
$$

where $p = (p_1, \ldots, p_r)'$, and let

$$
b_m = \frac{1 - p_m}{r-1},
$$

where $\mu_m$ is the vector of zeros with a one in the mth place, and $m = 1, \ldots, r$. Then it may be checked that the likelihood corresponding to Eq. 5b may be equivalently written (as a function of the $\pi_{kj}^{(1)}$'s)


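$$
L(z_1, \ldots, z_n \mid \pi_1^{(1)}, \ldots, \pi_n^{(1)}) = \prod_{j=1}^{n} \prod_{m=1}^{r} \left( a_m' \pi_j^{(1)} + b_m \right)^{z_{mj}}, \tag{16}
$$

where $\pi_j^{(1)\prime} = (\pi_{1j}^{(1)}, \ldots, \pi_{rj}^{(1)})$, and $(a_m, b_m)$ are independent of $\pi_j^{(1)}$. Since $z_{mj}$ may only take on the values zero and one (with exactly one $z_{mj} = 1$ for each j), L may be written in the equivalent form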
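$$
L = \prod_{j=1}^{n} \sum_{m=1}^{r} z_{mj} \left( a_m' \pi_j^{(1)} + b_m \right). \tag{17}
$$

Integrating to find the marginal likelihood function gives

$$
L(z_1, \ldots, z_n \mid p) = \int \cdots \int L(z_1, \ldots, z_n \mid \pi_1^{(1)}, \ldots, \pi_n^{(1)})\, f(\pi_1^{(1)}, \ldots, \pi_n^{(1)})\, d\pi_1^{(1)} \cdots d\pi_n^{(1)},
$$

where $f(\pi_1^{(1)}, \ldots, \pi_n^{(1)})$ denotes the joint density of $\pi_1^{(1)}, \ldots, \pi_n^{(1)}$. Assume, for simplicity, that there is no collusion among subjects, and take

$$
f(\pi_1^{(1)}, \ldots, \pi_n^{(1)}) = \prod_{j=1}^{n} f_j(\pi_j^{(1)}),
$$

where the $f_j(\pi_j^{(1)})$ are the densities of the $\pi_j^{(1)}$ vectors. Then

$$
L(z_1, \ldots, z_n \mid p) = \prod_{j=1}^{n} \sum_{m=1}^{r} z_{mj} \int \left( a_m' \pi_j^{(1)} + b_m \right) f_j(\pi_j^{(1)})\, d\pi_j^{(1)} = \prod_{j=1}^{n} \sum_{m=1}^{r} z_{mj} \left( a_m' M_j + b_m \right).
$$

Returning to the exponential form yields the marginal likelihood function

$$
L(z_1, \ldots, z_n \mid p) = \prod_{j=1}^{n} \prod_{m=1}^{r} \left( a_m' M_j + b_m \right)^{z_{mj}}. \tag{18}
$$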


Note that Eq. 18 is the same function of p as is Eq. 16, with the $\pi_{kj}^{(1)}$'s replaced by their expected values. Hence, the resulting programming problem yields the same estimates of $(p_1, \ldots, p_r)$ if the $\pi_{kj}^{(1)}$'s are replaced by their expected values, which are therefore certainty equivalents for this problem.
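The certainty-equivalence result rests on the fact that each subject's factor in Eq. 17 is linear in $\pi_j^{(1)}$. The sketch below checks this numerically for one subject with r = 3: averaging the claim probability $a_m'\pi_j^{(1)} + b_m$ over a hypothetical discrete prior gives the same value as evaluating it at the prior mean $M_j$. The prior, the $p$ vector, and the function names are illustrative assumptions.

```python
def a_b(p, m):
    """Coefficients of Eq. 16: a_m = p_m mu_m - (p - p_m mu_m)/(r-1), b_m = (1-p_m)/(r-1)."""
    r = len(p)
    a = [p[m] if i == m else -p[i] / (r - 1) for i in range(r)]
    b = (1.0 - p[m]) / (r - 1)
    return a, b


def claim_prob(p, m, pi_vec):
    """a_m' pi_j + b_m: probability that the subject claims category m."""
    a, b = a_b(p, m)
    return sum(x * y for x, y in zip(a, pi_vec)) + b


# Hypothetical discrete prior for one subject's pi^(1) vector (r = 3).
support = [[0.9, 0.5, 0.3], [0.6, 0.8, 0.2], [0.3, 0.4, 0.9]]
weights = [0.5, 0.3, 0.2]
M = [sum(w * v[i] for w, v in zip(weights, support)) for i in range(3)]
p = [0.5, 0.3, 0.2]

for m in range(3):
    averaged = sum(w * claim_prob(p, m, v) for w, v in zip(weights, support))
    print(m, averaged, claim_prob(p, m, M))
```

The equality holds only because of linearity; it would not survive if the likelihood were nonlinear in $\pi_j^{(1)}$ or if subjects colluded (a joint density that does not factor).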


EFFECT  OF  IGNORING  HOSTILE  SUBJECTS 

The question naturally arises whether it is worthwhile to use the type of analysis recommended in this Memorandum, i.e., whether there is much saving over simply ignoring the effect of misclassified subjects. The answer is that the saving can be slight, or it can be so large as to make some corrective action mandatory, depending upon the situation. The effect is illustrated quantitatively below for the case of two categories and independence. The example used to evaluate this problem can also be used as an approximate estimation technique.

Suppose R percent of the subjects interviewed claim to belong to category one. Then, if the hostility effect is ignored, the usual estimator of $p_1$ (maximum likelihood) gives $100\hat{p}_1 = R$.

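Next, suppose that a fraction, $\alpha$, of the subjects claiming category one lie, and that a fraction, $\beta$, of the subjects claiming category two lie. The subjects truly in category one are then the truthful claimants of category one plus the liars among the claimants of category two, so it is easy to see intuitively that an estimator which accounts for the liars is given by

$$100\hat{p}_1 = R(1 - \alpha) + \beta(100 - R). \tag{19}$$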
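Eq. 19 is simple enough to evaluate directly. In the sketch below the values R = 60, alpha = 0.25 and beta = 0.10 are illustrative assumptions, not data from this Memorandum, and the function name is hypothetical.

```python
def corrected_percent(R, alpha, beta):
    """Eq. 19: 100 * p1_hat = R(1 - alpha) + beta(100 - R).

    R:     percent of subjects claiming category one
    alpha: fraction of category-one claimants who lie
    beta:  fraction of category-two claimants who lie
    """
    # truthful claimants of category one, plus the liars hiding in category two
    return R * (1 - alpha) + beta * (100 - R)

naive = 60.0                                  # hostility ignored: 100 * p1_hat = R
print(naive, corrected_percent(60, 0.25, 0.10))
```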


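In fact, exactly this result is obtained from Eq. 10 by making the substitutions

$$z_{1j} = \begin{cases} 1, & j = 1, \ldots, \dfrac{nR}{100}, \\[4pt] 0, & j = \dfrac{nR}{100} + 1, \ldots, n, \end{cases}$$

$$\pi_j = \begin{cases} 0, & j = 1, \ldots, \dfrac{\alpha nR}{100}, \\[4pt] 1, & j = \dfrac{\alpha nR}{100} + 1, \ldots, \dfrac{nR}{100}, \\[4pt] 0, & j = \dfrac{nR}{100} + 1, \ldots, \dfrac{nR}{100} + \beta\left(n - \dfrac{nR}{100}\right), \\[4pt] 1, & j = \dfrac{nR}{100} + \beta\left(n - \dfrac{nR}{100}\right) + 1, \ldots, n. \end{cases}$$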
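These substitutions can be checked numerically. The sketch assumes that Eq. 10, which lies outside this section, is the two-category case of Eq. 16 with $\pi_{1j} = \pi_{2j} = \pi_j$, so that a subject reports category one with probability $p\pi_j + (1 - p)(1 - \pi_j)$; this is our reading, not a quotation. With the illustrative values n = 100, R = 60, alpha = 0.25 and beta = 0.10, a grid search over p should place the maximum at $(60 \cdot 0.75 + 0.10 \cdot 40)/100 = 0.49$, the value Eq. 19 gives. The function name is hypothetical.

```python
from math import log

def substituted_loglik(p, n, R, alpha, beta):
    """Two-category log-likelihood after the z_1j and pi_j substitutions."""
    c1 = n * R // 100                     # nR/100 subjects claim category one
    liars1 = round(alpha * c1)            # alpha nR/100 liars among them
    liars2 = round(beta * (n - c1))       # beta (n - nR/100) liars among the rest
    total = 0.0
    for j in range(n):
        z1 = 1 if j < c1 else 0
        lying = (j < liars1) if z1 else (j - c1 < liars2)
        pi_j = 0.0 if lying else 1.0
        P1 = p * pi_j + (1 - p) * (1 - pi_j)   # probability of reporting category one
        total += log(P1) if z1 else log(1 - P1)
    return total

grid = [k / 1000 for k in range(1, 1000)]
best = max(grid, key=lambda p: substituted_loglik(p, 100, 60, 0.25, 0.10))
print(best)   # 0.49
```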


Now define e to be the absolute error (expressed in percent probability) made by ignoring the effect of hostile subjects. Then from Eq. 19

$$e = \left|100\hat{p}_1 - R\right| = \left|\beta(100 - R) - \alpha R\right|. \tag{20}$$


Examination of Eq. 20 shows that e can take any value between 0 and 100 percent, depending upon the values of $(\alpha, \beta, R)$.

For example, if all subjects lie, $\alpha = 1$ and $\beta = 1$. Then from Eq. 20, $e = |100 - 2R|$. Thus, if R = 100, e is 100 percent, whereas if R = 50, e = 0. All varieties of intermediate results may be obtained by considering excursions of $\alpha$, $\beta$, and R.
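Eq. 20 can be tabulated in the same way. The sketch reproduces the all-liars example of the text ($\alpha = \beta = 1$) for a few illustrative values of R; the function name is hypothetical.

```python
def ignoring_error(R, alpha, beta):
    """Eq. 20: e = |100 p1_hat - R| = |beta(100 - R) - alpha R|, in percent."""
    return abs(beta * (100 - R) - alpha * R)

# all subjects lie: alpha = beta = 1, so e = |100 - 2R|
for R in (100, 75, 50, 0):
    print(R, ignoring_error(R, 1.0, 1.0))   # e = 100, 50, 0 and 100 percent
```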
V. MULTIPLE QUESTION ANALYSIS

This Section is addressed to the problem of estimating the multinomial population proportions for many questions simultaneously.

This problem is more complex than the single question case for two reasons. One reason is that some subjects may exhibit inconsistent behavior, in that their truth telling probabilities may vary over question number. The second reason is that the responses of a given subject to many questions may be correlated. There will be a multivariate probability distribution generated by the joint probability of actually belonging to category k for question number $i_1$, and belonging to category k' for question number $i_2$, etc. The problem is clearly much more difficult, but we can examine what is involved.

Suppose consistent behavior can be assumed. Define $p_k^{(i_1)}$ as the probability of belonging to category k for question $i_1$, and let $z_{kj}^{(i_1)}$ denote the value of $z_{kj}$ for question $i_1$. We require simultaneous estimation of the category probabilities for each question. In particular, it is desired to find estimators of the two-dimensional marginal probabilities, $p_{kk'}^{(i_1, i_2)}$, of falling in category k on question $i_1$ and category k' on question $i_2$, and of the higher order marginal probabilities, which are more complicated. These probabilities can be developed by evaluating the covariance matrix of the jointly distributed $z_{kj}^{(i)}$.
VI. A SPECIAL CASE AND AN OPPOSITION STRATEGY


In this Section it is shown that the one case in which the analysis cannot be used is the one corresponding to maximum entropy, or information content, in the set of responses. The sense of "information" used here is that of Shannon (see, for example, Khinchin). Based upon this result, statistical inference on the $p_k$'s is hopeless in that case, although at the same time the result provides motivation for the development of an optimal strategy for hostile subjects.

Recall that the assumption following Eq. 8 required that

$$\frac{r\pi_j - 1}{r - 1} \neq 0$$

for at least one j. Clearly, unless this is true, Eq. 8 yields no information and the entire analysis (for the case of independence) breaks down. When r = 2 and independence applies, the assumption requires that

$$2\pi_j - 1 \neq 0$$

for at least one j; i.e., it is required that there exist at least one j for which $\pi_j \neq 1/2$.

The r events corresponding to the jth subject's response falling into the kth category, k = 1, ..., r, have associated probabilities (see Eq. 4, and take $\pi_{1j} = \cdots = \pi_{rj} = \pi_j$)

$$P_{kj} = p_k\pi_j + (1 - p_k)\frac{1 - \pi_j}{r - 1}, \qquad k = 1, \ldots, r.$$

Since these r events partition the space of possible events, the jth interview corresponds to an experiment whose information content is defined as

$$I_j = -\sum_{k=1}^{r} P_{kj} \ln P_{kj}.$$


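It is widely known (and trivial to show) that $I_j$ is maximized when $P_{kj} = 1/r$; that is, when

$$p_k\left(\pi_j - \frac{1 - \pi_j}{r - 1}\right) + \frac{1 - \pi_j}{r - 1} = \frac{1}{r}.$$

But this equation must hold identically in $p_k$. Therefore, it is necessary that

$$\pi_j - \frac{1 - \pi_j}{r - 1} = 0, \qquad \frac{1 - \pi_j}{r - 1} = \frac{1}{r}.$$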

Substitution shows that these equations require that

$$\pi_j = \frac{1}{r}$$

for all j. For the two-category independence case, $\pi_j = 1/2$ for all j yields the maximum information. Thus, failure of the assumption required in the analysis corresponds to maximum entropy, or disorder, in the set of responses of the subjects.
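The maximum-entropy argument can be illustrated numerically. The sketch assumes the independence form $P_{kj} = p_k\pi_j + (1 - p_k)(1 - \pi_j)/(r - 1)$ with a common truth-telling probability $\pi_j$, uses natural logarithms, and takes illustrative values of p and $\pi_j$; the function names are hypothetical. At $\pi_j = 1/r$ the factor $(r\pi_j - 1)/(r - 1)$ vanishes, every $P_{kj}$ equals $1/r$ whatever p is, and $I_j$ reaches its maximum $\ln r$.

```python
from math import log

def response_probs(p, pi_j):
    """P_kj = p_k pi_j + (1 - p_k)(1 - pi_j)/(r - 1), assumed independence form."""
    r = len(p)
    return [pk * pi_j + (1 - pk) * (1 - pi_j) / (r - 1) for pk in p]

def information(P):
    """I_j = -sum_k P_kj ln P_kj, using natural logarithms."""
    return -sum(x * log(x) for x in P if x > 0)

r = 3
for p in ([0.6, 0.3, 0.1], [0.2, 0.2, 0.6]):    # illustrative category probabilities
    for pi_j in (1 / r, 0.7, 0.9):              # illustrative truth-telling values
        factor = (r * pi_j - 1) / (r - 1)       # must be nonzero for the analysis
        I = information(response_probs(p, pi_j))
        print(p, round(pi_j, 3), round(factor, 3), round(I, 4))
# At pi_j = 1/3 the factor is 0 and I_j = ln 3 (about 1.0986) for both p.
```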

If such a case arose in practice, maximum likelihood estimators could not be used. For this reason, if hostile subjects were aiming at an optimal strategy, they would all lie independently of the categories they occupy and would randomize their responses. They would tell the truth 100(1/r) percent of the time (when there are r possible responses to a question), and lie half the time when they are in dichotomous response situations.
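The strategy just described can be simulated. The sketch is an illustration rather than part of the analysis above: it assumes each hostile subject tells the truth with probability 1/r and otherwise names one of the other r - 1 categories uniformly at random (the independence case), and it uses two made-up populations with very different p. Both should produce response shares close to 1/r, so the responses carry essentially no information about p. The function name and sample sizes are hypothetical.

```python
import random

def simulate(p, n, seed=0):
    """Shares of each reported category when every subject tells the truth with
    probability 1/r and otherwise picks one of the other categories at random."""
    rng = random.Random(seed)
    r = len(p)
    counts = [0] * r
    for _ in range(n):
        true_k = rng.choices(range(r), weights=p)[0]   # true category, drawn from p
        if rng.random() < 1 / r:
            reply = true_k                             # tells the truth
        else:
            reply = rng.choice([k for k in range(r) if k != true_k])
        counts[reply] += 1
    return [c / n for c in counts]

# Two very different populations give nearly the same, uniform, response pattern.
print(simulate([0.8, 0.1, 0.1], 60000))
print(simulate([0.1, 0.1, 0.8], 60000, seed=1))
```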
PURPOSE: To provide a statistical method for estimating the proportions
of items in each of several categories, based on an item-by-item
classification in which many items may be misclassified. A specific case of
interest is that in which the items are subjects being interviewed and the
subjects may give false responses.

RELATED  TO:  Motivation  and  morale  studies  of  the  Viet  Cong  conducted 
by  RAND  for  ARPA  and  ISA. 

METHODOLOGY  AND  DISCUSSION:  In  sampling  from  a  human  population 
it  is  usually  assumed  that  the  interviewee  is  cooperative  and  that  his 
responses  to  questions  correspond  to  the  true  situation  as  far  as  he  is 
concerned. Thus, in a sample of size n with Sn individuals who claim to
have  a  given  characteristic,  Sn/n  is  the  maximum  likelihood  estimator  of 
the  population  proportion  corresponding  to  that  characteristic.  If  some  of 
the  people  questioned  are  hostile  to  the  interviewer  in  that  they  give  false 
responses  for  some  reason,  then  Sn/n  is  no  longer  a  reasonable  estimator, 
and  a  different  procedure  must  be  used. 

FINDINGS:  This  study  developed  maximum  likelihood  estimators  of  the 
category  proportions  for  both  the  two-category  and  the  multicategory 
response  cases  with  respect  to  a  group  of  noncooperative  interviewees. 

An  assessment  is  made,  for  each  subject,  of  the  probability  that  he  is 
hostile.  These  probabilities  are  then  combined  with  the  actual  responses  to 
yield  the  maximum  likelihood  estimators.  Explicit  evaluation  of  the 
estimators  for  a  sample  of  n  subjects  and  r  categories  requires  solution  of 
a  simple  concave  programming  problem  involving  a  logarithmic  objective 
function in variables confined to the unit interval. A Bayesian approach is
used to evaluate the misclassification (or hostility) probabilities. It is
assumed that the analysis applies to a single question only. For a survey
containing  many  questions  the  estimators  would  be  evaluated  separately 
for  each  question.  Moreover,  indication  is  given  of  how  the  analysis  can 
be  generalized  to  consider  many  questions  simultaneously. 
A special case in which the analysis cannot be used is the one
corresponding to maximum entropy or information content in the set of
responses. In this case, if hostile subjects were aiming at an optimal
strategy, they would all lie independently of the categories they occupy
and would randomize their responses: they would tell the truth 100(1/r)
percent of the time (when there are r possible responses to a question),
and lie half the time when they are in dichotomous response situations.

POTENTIAL  FOR  FURTHER  DEVELOPMENT:  Press  has  laid  the 
theoretical  groundwork  for  a  reanalysis  of  many  of  the  interviews 
already  taken  from  Hoi  Chanh  and  Tu  Binh.  Certainly  some  rigorous 
empirical  testing  of  the  theoretical  estimates  ought  to  be  carried  out  to 
check the validity of Press' findings and hypotheses.

EVALUATION: Bayesian statistics has its prestigious supporters and
its equally influential detractors. One would imagine that a marriage of
the empirical data from the Motivation and Morale study and Mr. Press'
techniques  would  resolve  the  issue,  at  least  in  this  instance.  Until  that 
occurs,  one  might  only  note  that  the  approach  is  solid,  the  concepts  are 
clear,  and  the  analysis  seems  reasonable. 